File type detection algorithm based on principal component analysis and K nearest neighbors
YAN Mengdi, QIN Linlin, WU Gang
Journal of Computer Applications, 2016, 36(11): 3161-3164. DOI: 10.11772/j.issn.1001-9081.2016.11.3161
Identifying file types by file-name suffix or by file-specific features can give low recognition accuracy. To address this problem, a new content-based file-type detection algorithm built on Principal Component Analysis (PCA) and K Nearest Neighbors (KNN) was proposed. First, PCA was used to reduce the dimensionality of the sample space. Then the training samples were clustered so that each file type was represented by its cluster centroids. To reduce the error caused by unbalanced training samples, a distance-weighted KNN algorithm was proposed. Experimental results show that, when the number of training samples is large, the improved algorithm reduces computational complexity while maintaining a high recognition accuracy rate. Because the algorithm does not depend on the specific features of each file type, it is more widely applicable.
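The abstract does not specify the content feature, the number of principal components, the clustering method, or the exact weighting function. The following is a minimal sketch of the described pipeline under assumptions of our own: a normalized 256-bin byte-frequency histogram as the content feature, PCA computed by power iteration, k-means to produce a few centroids per file type, and inverse-distance weights 1/d in the KNN vote. All names (`FileTypeClassifier`, `byte_histogram`, `fit_pca`, `kmeans`) and default parameters are hypothetical and not taken from the paper.

```python
import math
import random
from collections import defaultdict


def byte_histogram(data):
    """Normalized 256-bin byte-frequency vector (assumed content feature)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    total = max(len(data), 1)
    return [c / total for c in counts]


def fit_pca(X, n_components, iters=100, seed=0):
    """Return (mean, components): top principal axes found by power iteration."""
    d = len(X[0])
    mean = [sum(col) / len(X) for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, mean)] for row in X]
    rng = random.Random(seed)
    comps = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = Xc^T (Xc v) is proportional to (covariance @ v)
            proj = [sum(a * b for a, b in zip(row, v)) for row in Xc]
            w = [sum(p * row[j] for p, row in zip(proj, Xc)) for j in range(d)]
            for c in comps:  # deflation: stay orthogonal to earlier axes
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:  # no variance left in this direction
                return mean, comps
            v = [a / norm for a in w]
        comps.append(v)
    return mean, comps


def project(x, mean, comps):
    """Coordinates of x in the reduced PCA space."""
    centered = [a - m for a, m in zip(x, mean)]
    return [sum(a * b for a, b in zip(centered, c)) for c in comps]


def kmeans(points, k, iters=20):
    """Plain k-means with deterministic init from evenly spaced samples."""
    k = min(k, len(points))
    step = len(points) / k
    cents = [list(points[int(i * step)]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, cents[i]))
            groups[nearest].append(p)
        cents = [[sum(col) / len(g) for col in zip(*g)] if g else cents[i]
                 for i, g in enumerate(groups)]
    return cents


class FileTypeClassifier:
    """PCA -> per-type cluster centroids -> distance-weighted KNN vote."""

    def __init__(self, n_components=2, clusters_per_type=2, k=3):
        self.n_components = n_components
        self.clusters_per_type = clusters_per_type
        self.k = k

    def fit(self, samples):
        """samples: list of (raw bytes, file-type label)."""
        X = [byte_histogram(data) for data, _ in samples]
        self.mean, self.comps = fit_pca(X, self.n_components)
        by_type = defaultdict(list)
        for x, (_, label) in zip(X, samples):
            by_type[label].append(project(x, self.mean, self.comps))
        # Each type is summarized by a fixed number of centroids, so the
        # vote no longer scales with how many training files a type has.
        self.centroids = [(c, label) for label, pts in by_type.items()
                          for c in kmeans(pts, self.clusters_per_type)]
        return self

    def predict(self, data):
        z = project(byte_histogram(data), self.mean, self.comps)
        nearest = sorted(self.centroids, key=lambda cl: math.dist(z, cl[0]))
        votes = defaultdict(float)
        for c, label in nearest[: self.k]:
            votes[label] += 1.0 / (math.dist(z, c) + 1e-9)  # closer = heavier vote
        return max(votes, key=votes.get)
```

Typical use is `FileTypeClassifier().fit(samples).predict(data)`, where `samples` is a list of `(bytes, label)` pairs. Classifying against a handful of centroids rather than every training sample is what keeps prediction cost low when the training set is large, which matches the abstract's complexity claim; the 1/d weighting is only one plausible choice of distance weight.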